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Abstract 

The statistical aspects of determinantal point processes (DPPs) seem largely 
unexplored. We review the appealing properties of DDPs, demonstrate that 
they are useful models for repulsiveness, detail a simulation procedure, and 
provide freely available software for simulation and statistical inference. We 
pay special attention to stationary DPPs, where we give a simple condition 
ensuring their existence, construct parametric models, describe how they can 
be well approximated so that the likelihood can be evaluated and realizations 
can be simulated, and discuss how statistical inference is conducted using the 
likelihood or moment properties. 

Keywords: maximum likelihood based inference, point process density, prod- 
uct densities, simulation, repulsiveness, spectral approach. 



1 Introduction 

Determinantal point processes (DPPs) are largely unexplored in statistics, though 
they possess a number of very attractive properties and have been studied in math- 
ematical physics, combinatorics, and random matrix theory even before the general 



notion was introduced in Macchi (1975). They have been used to model fermions in 



quantum mechanics, in classical Ginibre and circular unitary ensembles from random 
matrix theory, for examples arising from non-intersecting random walks and random 



spanning trees, and much more, see Section 2 in Soshnikov (2000) and Section 4.3 
in Hough et al. (2009). They can be defined on a locally compact space, where the 
two most important cases are the <i-dimensional Euclidean space IR d and a discrete 



state space. Hough et al. (2009) provides an excellent survey on the probabilistic 



aspects of DPPs. 

The focus in the present paper is on statistical aspects for DPPs defined on a 
Borel set B C M d , with d = 2 and B = R 2 in most of our examples, and with 



*An alphabetical ordering has been used since all authors have made significant contributions 
to the paper. 
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its distribution specified by a continuous complex covariance function C denned on 
B x B and which is properly scaled (these regularity conditions on C are imposed to 



ensure existence of the process as discussed in Section 2.2). DPPs possess a number 
of appealing probabilistic properties: 

(a) By the very definition, all orders of moments of a DPP defined on B are 
described by certain determinants of matrices with entries given by C (Sec- 
tion 



2.1). 



(b) The restriction of such a determinantal point process to a Borel subset A of B 
is also a DPP with its distribution specified by the restriction of C to A x A. 

(c) A one-to-one smooth transformation or an independent thinning of a DPP is 



also a DPP (Section 2.1). 



(d) A DPP can easily be simulated, since it is a mixture of 'determinantal projec- 
tion point processes' (Section [3l . 



(e) A DPP restricted to a compact set has a density (with respect to a Poisson pro- 
cess) which is proportional to the determinant of a matrix with entries given 
by a complex covariance function C, where C is obtained by a simple trans- 
formation of the eigenvalues in a spectral representation of C, and where the 
normalizing constant of the density has a closed form expression (Section [4]) . 

From a statistical perspective, due to (a)-(e), modelling and estimation for DPPs 
become tractable. Moreover, we demonstrate that DPPs are useful models for repul- 
sive interaction. The usual class of point processes used for modeling repulsiveness 
is the class of Gibbs point processes (see M0ller and Waagepetersen (2004, 2007) 



and the references therein). For a general Gibbs point process, the moments are not 
expressible in closed form, the density involves an intractable normalizing constant, 
and rather elaborate Markov chain Monte Carlo methods are needed for simulations 
and approximate likelihood inference. 

Figure [T] shows realizations in the unit square of two stationary DPPs with 
intensity p = 100. They are defined by a Gaussian covariance function with scale 
parameter (a) a = 0.05 and (b) a = 0.01 (see Section 5.2 for details). Compared to 



realizations of a homogeneous Poisson process, the point patterns in Figure [T] look 
more regular, and the regularity becomes more pronounced as a increases. 

Generalizations of DPPs to weighted DPPs, which also are models for repulsion, 
and to the closely related permanental and weighted permanental point processes, 



which are models for attraction, are studied in Shirai and Takahashi (2003) and Me 



Cullagh and M0ller (2006). Since determinants have a geometric meaning, are mul- 



tiplicative, and there are algorithms for fast computations, DPPs are much easier to 
deal with, not at least from a statistical and computational perspective. Section [5\3 
discusses a useful approximation of C using a Fourier basis approach; this applies 
as well for weighted DPPs and weighted permanental point processes. 

The paper is organized as follows. Section [2] defines DPPs and studies existence 
and probabilistic properties of them. Section [3] describes a general simulation proce- 
dure for DPPs. Section [4] discusses density expressions for DPPs. Section [5] studies 
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(a) 



(b) 



Figure 1: Realizations of two Gaussian DPP models as described in Section 5J2 with 
intensity p = 100 and scale (a) a = 0.05 and (b) a = 0.01. 



stationary DPPs and how to approximate them. Section [6] discusses statistical in- 
ference for parametric models of DPPs. Appendices |A|fD] contain proofs of some of 
the results in the paper and provide supplementary methods and examples to the 
ones presented in the main text. 



The statistical analyzes in this paper have been conducted with R (R Develop- 



ment Core Team, 2011). The software we have developed is freely available as a 



supplement to the spatstat library (Baddeley and Turner, 2005) enabling users to 



both simulate and fit parametric models of stationary DPP models. 



2 Basics 



Section |2.1| discusses the definition of DPPs, while Section |2.2| discusses their exis- 
tence. 



2.1 Definition 



Let B C W 1 be a Borel set. Consider a simple locally finite spatial point process X on 
B, i.e. we can view X as a random locally finite subset of B; for measure theoretical 
details, see e.g. M0ller and Waagepetersen (2004) and the references therein. We 



refer to the elements (or points) of X as events. The following basic notions are 
needed before defining when X is a DPP. 

Recall that for an integer n > 0, X has n'th order product density function 
p(n) . _^ ^ ^ ^ g f unc ti on i s locally integrable (with respect to Lebesgue 
measure restricted to B) and for any Borel function h : B n — > [0, oo), 



E 



XI 



^2 h ( xi > 

...,i»ex 



p {n \x 1 ,.. .,x n )h(x 1 , 



X Ti 



dxi---dx n (2.T 



B 
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where 7^ over the summation sign means that x\, . . . , x n are pairwise distinct events. 



See e.g. Stoyan et al. (1995). Intuitively, for any pairwise distinct points X\, . . . , x n G 
B, p( n \xi, . . . , x n ) dx\ ■ ■ ■ dx n is the probability that for each i — 1, ■ • • , n, X has 
a point in an infinitesimally small region around Xi of volume dxi. Clearly, 
only uniquely defined up to a Lebesgue nullset. We shall henceforth require that 
p( n \xi, . . . = if Xi = Xj for some i 7^ j. This convention becomes consistent 
with Definition 12. II below. 

In particular, p = p^ is the intensity function and g(x, y) = p^^x, y)/[p(x)p(y)] 
is the pair correlation function, where we set g{x,y) = if p(x) or p(y) is zero. By 
our convention above, g(x,x) = for all x G B. The terminology 'pair correlation 
function' may be confusing, but it is commonly used by spatial statisticians. In 
fact, for disjoint bounded Borel sets Ai,A2 C lR d , if N(A) denotes the number of 
events falling in A, then the covariance between N(Ai) and N^A?) is the integral 
over Ai x A 2 of the covariance function given by c(x, y) = p(x)p(y)(g(x, y) — 1) for 
x 7^ y. For a Poisson point process with an intensity function p, and for x 7^ y, we 
have c(x,y) = 0, while g(x,y) = 1 if p(x) > and p(y) > 0. In spatial statistics 
and stochastic geometry, g is more commonly used than c, and we shall also pay 
attention to g. 

Let C denote the complex plane. For a complex number z = z\ + \Z2 (where 
zi,z 2 G K and i = y/— 1), we denote z = Z\ — iz 2 the complex conjugate and 
\z\ = y z\ + z\ the modulus. 

For any function C : B x B — > C, let [C](xi, . . . ,x n ) be the n x n matrix 
with (z,j)'th entry C(xi,Xj). For a square complex matrix A, let det A denote its 
determinant. 

Definition 2.1. Suppose that a simple locally finite spatial point process X on B 
has product density functions 

pV>(x 1 ,...,x n )=det[C\(x 1 ,...,x n ), (xi, . . . , x n ) G B n , n = l,2,.... (2.2) 

Then X is called a determinantal point process (DPP) with kernel C , and we write 
X ~ DPP B (C). 

Note that a Poisson process is the special case where C(x, y) = whenever x 7^ y. 

Remark 2.2. The focus in this paper is mostly on the case B = M. d and sometimes 
on the case B = S where S is compact. 

For X ~ DPP B (C) and any Borel set A C B, define X A = X n A and denote 
its distribution by DPPs(C; A). We also write DPP a{C) for the distribution of the 
DPP on A with kernel given by the restriction of C to A x A. Then property (b) in 
Section [j] follows directly from Definition|2~Tj i.e. DPP j4 (C) = DPP B (C; A). Further, 



when B = R d } we write DPP(C) for DPP Rd (C), and DPP(C; A) for DPP Rd (C; A). 

Before discussing conditions on C ensuring the existence of DPPs, we notice the 
following properties. 



Suppose X ~ DPP(C). Then there is no other point process satisfying (2.2) 



(Lemma 4.2.6 in Hough et al. (2009)). By (2.2), the intensity function of X is 

p(x) = C(x,x), x G M. d , (2.3) 
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and the pair correlation function of X is 

C(x,y)C(y,x) 



1 - 



if C(x, x) > and C(y, y) > 



C(x,x)C(y,y) 

while it is zero otherwise. If C is Hermitian, then g < 1, showing that the events 
of X repel each other. Furthermore, if C is continuous, p (n) is also continuous and 
p <yn \x\, . . . , x n ) tends to zero as the Euclidean distance \\xi— Xj\\ goes to zero for some 
i 7^ j, cf. ( 2.2[ ). This once again reflects the repulsiveness of a DPP (repulsiveness 
is also discussed in Remark 4.3). 

For later use, let R(x,y) = C(x,y)/[C(x,x)C(y,y)] 1 ^ 2 , where we set R(x,y) = 
if C(x, x) = or C(y, y) = 0. When C is a covariance function, R is its corresponding 
correlation function and 



g(x,y) 



\R(x, 



x,y G 



(2.4) 



The following propositions concern smooth transformations and independent 
thinning of DPPs. 

Proposition 2.3. Let U C R d be an open set and T : IR d — > U a continuous 
differentiable bisection such that its inverse T _1 has a non-zero Jacobian determinant 
J T -x(x) for all x e U. If X 1 ~ DPP{C X ) and X 2 = T(X 1 ), then X 2 ~ DPPu(C 2 ) 
with 



Jt-x (x) I (T- 1 (x) , T" 1 (y ) ) I Jr-a (y ) | ^ . 



C 2 (x,y) 

Proof. Follows immediately from (2.1) and (2.2) 
Proposition 2.4. If Xi 



(2.5) 

□ 



DPP(C\) and X 2 is obtained as an independent thinning 
of Xi with retention probabilities p(x) , x G M d , then X 2 ~ DPP[C 2 ) with C 2 (x,y) = 

Proof. Let U = {U (x) : x G M d } be a random field of independent Bernoulli variables 
where P(U(x) = 1) = p(x) and U is independent of X\. Then X 2 is distributed as 
{x e X l :U(x 



1}, so from (2.1) and (2.2) it is clear that X 2 ~ DPP(C 2 ) 



□ 



2.2 Existence 



Existence of a DPP on Mr is ensured by the following assumptions on C, where 
Scl d denotes a generic compact set. 
Let C : 



x 



— > C be Hermitian, i.e. C(x,y) = C(y,x) for all x,y £ 



For convenience, assume that C is continuous. Denote L 2 (S) the space of square- 
integrable functions h : S — > C, and define the integral operator T$ : L 2 (S) — >■ L 2 (S) 
by 

T s (h)(x)= [ C(x,y)h(y)dy, x e S. (2.6) 

By Mercer's theorem (see e.g. Section 98 in Riesz and Sz.-Nagy (1990)), for any 
compact set S 1 C M d , C restricted to 5 x S has a spectral representation, 



G ( x > V) = ^2 X k<fik(x)(j)k(y), (x, y) £ S x S, 



(2.7) 



fc=i 



with absolute and uniform convergence of the series, and where 
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the set of eigenvalues {A&} is unique, each non-zero eigenvalue is real and has 
finite multiplicity, and the only possible accumulation point of the eigenvalues 
is 0; 



the eigenfunctions {4>k} form an orthonormal basis of L 2 (S), i.e. 

1 if k = I, 
if jfe ^ /, 



Ts{(f>k) = Afc^fc, / 4> k (x)4>i(x) dx 



{21 



and any h G L 2 (S) can be written as h = Ylk ) =i a k ( Pk, where a>k G C, k = 
1, 2, . . .. Moreover, (fik is continuous if ^ 0. 

When we need to stress that the eigenvalue A& depends on S, we write Af . We say 
that C (or Tg) is of local trace class if trg(C) = f s C(x,x) dx is finite, i.e. 



trg(C) = |Af | < oo for all compact S C 



(2.9) 



fc=i 



Finally, we introduce the following conditions (CI) and (C2), recalling that C is a 
complex covariance function if and only if it is Hermitian and non-negative definite: 



(CI) C is a continuous complex covariance function; 
(C2) Af < 1 for all compact S cR d and all k. 

Theorem 2.5. Under (CI), existence of DPP(C) is equivalent to (C2). 

Proof. A slightly different result where the eigenvalues are strictly less than one was 



first given in Theorem 12 of Macchi (1975). We apply Theorem 4.5.5 in Hough 
et al. | ( |2009| , where C : R d x R d -> C is Hermitian, locally square integrable, 



of local trace class, and, as (2.7) may not hold on a Lebesgue nullset, that C is 



simply given by (2.7); then existence of DPP(C) is equivalent to that the spectrum 
of T s is contained in [0, 1] for all compact S C R d (i.e. < Af < 1, k = 1, 2, . . .). 
When C is continuous, this nullset vanishes and local square integrability is satisfied. 
When C is Hermitian and non-negative definite, the eigenvalues are non-negative, 
and so continuity of C implies the local trace class assumption, since the trace 
J2T=i l-^fl = SfcLi = Is^( x ' x ) d x ^ s finite. Thereby Theorem 



2.5 



follows. 



□ 



Usually, for statistical models of covariance functions, (CI) is satisfied, and so 



(C2) becomes the essential condition. As discussed in Section 5.1| (C2) simplifies in 
the stationary case of X. 



Assumption 2.6. In the remainder of this paper, X 
the conditions (CI) and (C2). 



DPP(C) with C satisfying 



Remark 2.7. Various comments on (CI) and (C2) are in order. 



As noticed in Hough et al. (2009), there are interesting examples of DPPs with 



non-Hermitian kernels, but they do not possess various general properties, and the 



results and methods in our paper rely much on the spectral representation (2.7). 
We therefore confine ourselves to the Hermitian case of C . 
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Theorem 4.5.5 in Hough et al. (2009) deals with a more general setting for the 



existence of DPPs, where C satisfies the technical conditions given in our proof of 
Theorem |2.5[ Then C is not assumed to be continuous, while non-negative definite- 



ness of C becomes a necessary condition for existence of DPP(C), and so C has to be 
a covariance function. However, we find that (CI) is a natural condition for several 
reasons. First, statisticians are used to deal with covariance functions. Second, as 



seen in the proof of Theorem 2J3 the situation simplifies when C is assumed to be 
continuous. Third, continuity of C implies continuity of the intensity function and 
the pair correlation function. Conversely, if C is real and non-negative, continuity 
of p and g implies continuity of C. All in all, we therefore confine ourselves to the 
case of (CI). 

Though the Poisson process is determinantal from Definition 2.1 (taking C(x, y) = 
if x ytz y) ) it is neither covered by our approach nor by that in Hough et al. (2009). 



It actually corresponds to a limiting case of integral operators, where the limit is 
the multiplication operator Q given by Q(h)(x) = C(x,x)h(x). 



Remark 2.8. In continuation of Remark |2.2[ when we are only interested in consid- 
ering a DPP Y on a given compact set S C M> d , then (C1)-(C2) can be replaced by 
the assumption that C is a continuous complex covariance function defined on S x S 
such that Af < 1 for all k. The results in Sections [3]j4] are then valid for Y, even if 
there is no continuous extension of C to IR d x IR d which satisfies (C1)-(C2). However, 
it is convenient to assume (C1)-(C2) as we in Sections [5]{6] consider stationary DPPs. 



Remark 2.9. Recall that for any simple locally finite spatial point process Y on M. d 
with intensity function p, there exist unique so-called reduced Palm distributions P^, 
for Lebesgue almost all x G W 1 with p(x) > 0, and the reduced Palm distributions 
are determined by that 



E^h(x,Y\ {x}) = I I p(x)h(x,x) dP l x (x) dx 



(2.10) 



for any non-negative Borel function h, where x denotes a locally finite subset of ' 



See e.g. Stoyan et al. (1995) and Appendix C.2 in M0ller and Waagepetersen (2004). 



Intuitively, P x is the conditional distribution of Y \{x} given that Y has an event at 
x. When all n'th order product density functions p^ of Y exist, n — 1, 2, . . ., then 
for Lebesgue almost all x G M. d with p(x) > 0, P^ has n'th order product density 
function 

pW(x ls ...,x n ) = p^ n+1 \x, x x , . . . , x n )/p(x) (2.11) 
and otherwise we can take p^ 



, . . . , x n 



Takahashi (2003). 



Shirai and 



0. See e.g. Lemma 6.4 in 
For our DPP X, using (|2.2[) it can be shown that for all i6l d with C(x, x) > 



we can take P^ = DPP(C- ) where 

Cl(u,v) = det[C](u,x;v,x)/C(x,x), u,v G R d , 

and where [C](a;i, X2] yi, 2/2) is the 2x2 matrix with entries C(xi, yj), i,j = 1, 2. See 
Theorem 6.5 in |Shirai and Takahashi ( 2003[ ) (where their condition A is implied by 
the conditions in our Theorem 2.4). Moreover, (2.11) holds whenever C(x,x) > 0. 
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Let S C M. d be compact. A special simple case of X$ occurs when all non- 
zero eigenvalues Af are one, or equivalently, T5 is a projection. Then we call the 
restriction of C to S x S a projection kernel. McCullagh and M0ller (2006) calls 
then Xs a special DPP, but in the present paper we use a more commonly used 



terminology (e.g. Hough et al. (2006) and Hough et al. ( 2009[ )). 



Definition 2.10. When C restricted to S x S is a projection kernel, Xs is called a 
determinantal projection point process. 



3 Simulation 



A general algorithm for simulating a finite DPP is provided in Hough et al. (2006), 



including a proof of its validity in a very general setup. While it is the only known 
general simulation procedure for DPPs, there are special cases which may be sim- 



ulated in a different manner, e.g. the Ginibre ensemble, see Section 4.3 in Hough 



et al. (2009). 



We explain and prove the general simulation algorithm in the specific case where 
we want to simulate Xs ~ DPP(C; S) with Sci d compact. Our implementation 



of the general algorithm becomes more efficient than the one in |Scardicchio et aL 
(2009), and our exposition uses mainly linear algebra and is less technical than the 



exposition in Hough et al. (2006). 



Consider the spectral representation (2.7) of C restricted to S x S. The general 



simulation algorithm comes from the following result proved in Theorem 7 in Hough 



et al. (2006); see also Theorem 4.5.3 in Hough et al. (2009). 



Theorem 3.1. For k = 1,2,..., let B k be independent Bernoulli variables with 
mean X k . Define the random projection kernel K : S x S — )■ C by 



K{x,y) = ^B k (j) k (x)(j) k (y). 



k=l 



Then 



DPP S (K) ~ DPP(C; S) 



(3.1) 



(3.2) 



in the sense that if we first generate the independent Bernoulli variables, and second 
generate a determinantal projection point process on S with kernel K , then the 
resulting point process follows DPP(C; S). 

Note that if N(S) = n(Xs) denotes the number of events in S, then 

00 00 00 

N(S)~J2B k , E[N(S)] = J2^, Wai[N(3)]=^2\ k (l-X k ). (3.3) 



k=l 



k=l 



k=l 



The first result in (3.3 ) follows from (3.2 ) and Theorem 3.2 below (or from Lemma 4.4.1 
in 



Hough et al. ( 2009[ )), and the first result immediately implies the two other results. 



In the following two sections, a two step simulation procedure based on Theo- 
rem EH is described in detail. 



S 



3.1 Simulation of Bernoulli variables 



We start by describing how the independent Bernoulli variables B\, B2, . . . can be 
simulated. 

Recall that P(B k = 1) = 1 - P(B k = 0) = X k , k = 1, 2, . . .. Define B = A = 1. 
With probability one, Yl B k < 00, since ^ A& < 00. Consequently, with probability 
one, the random variable M = max{A; > : B k 7^ 0} is finite. For any integer 
m > 0, it is easily verified that -Bo, • • • , B m _i are independent of the event {M = m}. 
Therefore the strategy is first to generate a realization m of M, second independently 
generate realizations of the Bernoulli variables B k for k = 1, . . . , m — 1 (if m = we 

do nothing), and third set S m = 1 and 5^ = for k — m + 1, m + 2, Simulation 

of these Bernoulli variables is of course easily done. For simulation of M, we use 
the inversion method described below. 

For m = 0,1,2,..., let 



Pn 



P(M = m) = \ m \[(l-\ i ). 



Note that ml = sup{A; > : = 1} is finite, and p m = whenever m < m'. For 
m > m', computation of the p m 's may be done via the recursion 



Pm' = TT (1-Afc), Pm+1 — T — 7^~~~T ^Pm, TTl = TTl' , Ul' + 1, . . . 

, 1 A m 1 — A m+ i) 

k=m'+i 



The calculation of p m > may involve numerical methods. 

Let F denote the distribution function of M and introduce 



q m = F{m) = P(M < m) = ^p k . 



k=0 



The inversion method is based on the fact that F (U) = min{m : q m > U} is 
distributed as M if U is uniformly distributed on (0, 1). 



3.2 Simulation of determinantal projection point process 



Suppose we have generated a realization of the Bernoulli variables B k as described 



in Section 3.1 and we now want to generate a realization from DPPs(.fr) with K 
given by (3.1 ). 

Let n = XlfcLi Bk denote the number of non-zero B k s with k > 1 (as foreshad- 



owed in connection to (3.3), n can be considered as a realization of the count N(S)). 
If n = 0, then K = and a realization from DPP5(i^) is simply equal to the empty 
point configuration. Assume that n > and without loss of generality that 



K(x,y) = ^2<j) k { x )<Pk{y) = v(y)*v{ 



(3.4) 



k=l 



where v(x) = (0i(x), . . . , n (x)) T , and where T and * denote the transpose and 
conjugate transpose of a vector or a matrix. For n-dimensional complex column 
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vectors such as v(x) and v(y), we consider their usual inner product (v(x), v(y)) = 
v(y)*v(x). 



Algorithm 1 Simulation of determinantal projection point process 

sample X n from the distribution with density p n (x) = \\v(x)\\ 2 /n, x G S 
sete 1 = v(X n )/\\v(X n )\\ 
for % = (n — 1) to 1 do 

sample X,i from the distribution with density 



Pi(x) 



\v(x)\\ 2 -J2\e*v(x)\' 

i=i 



x e S 



(3.5) 



set Wf = v(Xi) - YTjJi { e *j v ( x i)) e j> e n _ i+1 = Wi/\\wi 
end for 

return {X x , . . . ,X n } 



Theorem 3.2. If n > and K(x,y) = Y^k=i 4>k(x)4>k(y) for all x,y G S, then 
{Xi, . . . ,X n } generated by Algorithm^ is distributed as DPP S (K). 

Proof. See Appendix A. □ 

Remark 3.3. Let n > 0, and define H n = {0} and for i = n — 1, . . . , 1, 

Hi = span c {v(X n ), . . . , v(X i+ i)} = I ^ a j v ( x j) '■ a j 6 C \ ■ ( 3 - 6 ) 

[j=i+l J 

With probability one, v(X n ), . . . ,v(Xi) are linearly independent, cf. Appendix A. 
Thus, almost surely, Hi is a subspace of C n of dimension n — i. For i = n — 
1, . . . , 1, by the Gram-Schmidt procedure employed in Algorithm [TJ ex, ... , e n -i is 
an orthonormal basis of Hi. Further, for i = n, . . . , 1, ipi(x) is the square norm 
of the orthogonal projection of v(x) onto H^ (the orthogonal complement to Hi). 
Moreover, as verified in Appendix A, with probability one, Pi(x) becomes a density, 
where for i < n we are conditioning on (X n , . . . , X i+ i). 

Remark 3.4. According to the previous remark, 

t Pl (x) = \\Piv(x)\\ 2 (3.7) 

where Pi is the matrix of the orthogonal projection from C n onto H[~. Denoting by 
I n the n x n identity matrix, we have for i < n, 

k=n N 

This provides an alternative way to calculate the density Pi(x), where Pi is obtained 
recursively. This idea was used in Scardicchio et al. (2009) but, as noticed there, the 
successive multiplication of matrices leads to numerical instabilities. Some correc- 
tions must then be applied at each step to make Pi a proper projection matrix when 
n — i is large. In contrast, the calculation of Pi(x) in Algorithm [l] is straightforward 
and numerically stable. 
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Remark 3.5. To implement Algorithm [T] we need a way to sample from the densities 
Pi, i = n, . . . , 1. This may simply be done by rejection sampling with a uniform 
instrumental density and acceptance probability Pi{x)J swp yeS Pi(y). Note that for x 
such that v(x) G H^~, Pi(x) = \\v(x)\\ 2 /i. Thus for small values of i, simulation of X, 
by rejection sampling with respect to a uniform density may be inefficient. However, 
the computation of Pi(x) is fast so this is not a major drawback in practice. For the 
examples in this paper, we have just been using rejection sampling with a uniform 
instrumental distribution. Appendix B discusses other choices of the instrumental 
distribution. 



4 Densities 

This section briefly discusses some useful density expressions for Xs ~ DPP(C; S) 
when S C M. d is compact. Recall that the eigenvalues Xk = Af are less than or equal 
to one. 

In general, when some eigenvalues A^ are allowed to be one, the density of Xs is 



not available. But we can condition on the Bernoulli variables B^ from Theorem 3.1 
or just condition on K(x,y) for all x,y G S, to obtain the conditional density. 
Note that the trace ti s(K) = f s K(x,x) dx = YlkLi^k is almost surely finite. 
Conditional on K, when tr s (K) = n > 0, the ordered n-tuple of events of the 
determinantal projection point process Xs has density 

p(x 1 ,...,x n ) = det[K)(x 1 ,...,x n )/n\, (x 1 ,...,x n ) G S n , 



as verified in (A. 2). Moreover, by Algorithm 1 and Theorem 3.2 

Pn{x) = K(x,x)/n, x G S, 

is the density for an arbitrary selected event of Xs- This is in agreement with the 
simple fact that in the homogeneous case, i.e. when the intensity K(x, x) is constant 
on S, any event of X s is uniformly distributed on S. 

Assume that A& < 1 for all k = 1,2,..., which means that no is almost 



surely one. Then the density of Xs exists and is specified in Theorem 4.1 below 



where the following considerations and notation are used. If P(N{S) — n) > 0, then 



P(N(S) =m) > form = 0,...,n, cf. (3.3). Thus 



P(N(S) = 0) = H(l - X h ) 



is strictly positive, and we can define 



D = -\ogP(N(S) = 0) = -£>g(l - A fc ). (4.1) 



k=l 

Further, define C: Sx S ^Cby 



C(x,y) = J2~^Mx)<Pk(y) (4-2) 



k=\ 



11 



where 



A fc = A fc /(1 - A fc ), fc = l,2 



Let \S\ = Ldx, and set det[C](xi, . . . 



5 -"5 

1 if n = 0. 



Theorem 4.1. Assuming Xk < I, k — 1,2,..., then X$ is absolutely continuous 
with respect to the homogeneous Poisson process on S with unit intensity, and has 
density 

f{{x u . . . , x n }) = exp(|S| - D) det[C]( Xl , ...,x n ) (4.3) 



for all (x\ 



x n ) G S n and n — 0, 1, 



Proof. This was first verified in Macchi (1975). Note that the right hand side in 
(4.3) is not depending on the ordering of the events. Equation (4.3) follows from a 
longer but in principle straightforward calculation, using (3.2), (A.3), and the fact 
that if Y follows the homogeneous Poisson process on S with unit intensity, then 



pW(x 1 ,...,x n )=Ef(Y\j{xi 



See Shirai and Takahashi (2003) and McCullagh and M0ller (2006). 



□ 



Remark 4.2. It is possible to express C and D in terms of C without any direct 
reference to the spectral representations (2.7) and (4.2): Let 

C 1 s (x,y)=C s (x,y), C k s {x, y) = [ C*~\x, z)C s (z, y) dz, x, y e S, k = 2, 3, 



Then 



and 



k=l 



C{x,y) = J2cl(x,y), x,y G S. 



(4.4) 
(4.5) 

(4.6) 



k=l 



Also, as noticed in Macchi (1975), C is the unique solution to the integral equation 



C(x,y)- / C(x,z)C(z,y) dz = C(x,y), x,y G S. 



Section 



5.3 



and Appendix D discuss efficient ways of approximating C and D 
when X is stationary. 



Remark 4.3. The density (4.3) is hereditary in the sense that f({x%, . . . ,x n }) > 
whenever f({xx, . . . , x n+ i}) > 0. This allows us to define the Papangelou conditional 
intensity for all finite point configurations x = {x±, . . . , x n } C S and points u G S\x 
by 

A(«;aj) = f(x U {u})/f(x) = det[C}(x u . . . , x n , u)/det[C](x u ...,x n ) 



(taking 0/0 = 0). Georgii and Yoo (2005) use this to study the link to Gibbs point 
processes, and establish the following result of statistical interest: for any finite 
point configurations x C S and y C S, 



\(u; x) > X(u; y) whenever x C y 
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(4.7) 



and for any point u £ S \ x, 



X(u; x) < C(u,u) 



(4.t 



(Theorem 3.1 in Georgii and Yoo (2005)) 



The monotonicity property (4.7) is once again confirming the repulsiveness of a 
DPP. 



Equation (4.8) means that X$ is locally stable and hence that X$ can be coupled 
with a Poisson process Y$ on S with intensity function given by C(u,u), u £ S, 
such that X s C F 5 (see Kendall and M0ller| (120001) and |M0ller and Waagepetersen 



(2004)). This coupling is such that Xs is obtained by a dependent thinning of 
Ys as detailed in the abovementioned references. By considering a sequence Si C 
S 2 C ... of compact sets such that M. d = U n S n (e.g. a sequence of increasing balls 
whose diameters converge to infinity), and a corresponding sequence of processes 
X n ~ DPP(C; S n ) which are coupled with a Poisson process Y on IR d with intensity 
function given by C(u,u), u £ IR d , such that Xl C X 2 C . . . C Y, we obtain that 
U n X n C K follows DPP(C). In other words, X can be realized as a dependent 
thinning of the Poisson process Y. 

Imposing certain conditions concerning a finite range assumption on an extended 
version of C to M. d and requiring C to be small enough, it is possible to extend 
the Papangelou conditional intensity for Xs to a global Papangelou conditional 
intensity for X and hence to derive the reduced Palm distribution of X (for details, 



see Proposition 3.9 in Georgii and Yoo (2005)). Unfortunately, these conditions are 
rather restrictive, in particular when d > 2. 



5 Stationary models 



Suppose X ~ DPP(C) is stationary, i.e. its distribution is invariant under transla- 
tions, or equivalently, C is of the form 



C(x,y) = C Q (x-y), x,y £ 



(5.1) 



We also refer to Cq as a covariance function. Note that Cq(0), the variance corre- 



sponding to C, equals p, the intensity of X, cf. (2.3). 



In light of Propositions 2.3 2.4 as inhomogeneous DPPs can be obtained by 
transforming or thinning X, stationarity is not a very restrictive assumption. For 



example, by (2.5), if we transform X by a one-to-one continuous differentiable map- 



ping T such that its Jacobian matrix is invertible, then T(X) is a DPP with kernel 

a ra ns(x,y) = {J^ix^CoiT-'ix) - T'^y) ) I J T -x (y) | V 2 . (5.2) 

It is often convenient to require that Cq is isotropic, meaning that Cq{x) = 
P-Ro(ll^ll) is invariant under rotations about the origin in IR d . Then Co is real, and 
the pair correlation function depends only on the distance between pairs of points, 
g(x,y) = go(\\x — y\\), cf. (2.4). Hence commonly used statistical procedures based 



on the pair correlation function or the closely related X-function apply (see Ripley 



(1976, 1977) and M0ller and Waagepetersen (2004)). In particular, using the relation 



\Ro(r) 



1 - 9o(r) 



(5.3) 
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we can introduce the 'range of correlation', i.e. a distance ro > such that 
a/1 — goir) is considered to be negligible for r > ro, as exemplified later in (5.13). 



Isotropy is also a natural simplification, since in the stationary case, any aniso- 
tropic covariance function can be obtained from some isotropic covariance function 



using some rotation followed by some rescaling, see e.g. Goovaerts (1997). Exam- 



ples of stationary isotropic covariance functions are studied in Section 5.2 However 



the following Section 5.1 does not involve an assumption of isotropy, and the ap- 



proximation of Cq studied in Section |5.3| is only approximately isotropic when Co is 
isotropic. 



5.1 A simple spectral condition for existence 



The following Proposition 5. 1| concerns the meaning of (C2) in terms of the spectral 
density for Cq. We start by recalling what the spectral density is. 

For any number p > and Borel set B C R d , let L P (B) be the class of p- 
integrable functions h : B — > C, i.e. J B \h(x)\ p dx < oo. Denote • the usual inner 
product in R d . For any Borel function h : M. d — > C, define the Fourier transform 
?(h) of h by 

F(h)(x) = J h(y)e~ 27Tix - y dy, x E R d , 
provided the integral exists, and the inverse Fourier transform F~ l (h) of h by 



h(y)e 27Tix - y dy, x E 



provided the integral exists. For instance, if h E L 1 (IR d ), then J-(h) and are 
well-defined. 

Recall that L 2 (IR d ) is a Hilbert space with inner product 



(hi,h 2 } = j h 1 (x)h 2 (x)dx 

and the Fourier and inverse Fourier operators initially defined on L x (R d ) n L 2 (R d 
extend by continuity to J= : L 2 (R d ) ->■ L 2 (R d ) and J 7 " 1 : L 2 (R d ) -> L 2 (R d ). Fur- 
thermore, these are unitary operators that preserve the inner product, and J 7-1 



is 



the inverse of J 7 . See e.g. Stein and Weiss (1971). 



By Khinchin's (or Bochner's) theorem, since Cq is a continuous covariance func- 
tion, a spectral distribution function F exists, i.e. F defines a finite measure so 
that 



Co(x) 



^2-irix-y 



dF(y), 



x E 



If F is differentiable, then the derivative tp(x) = dF(x)/dx is called the spectral 
density, and is non-negative, E L 1 (R d ), and C = J r_1 ((^). On the other hand, 
if C E L 1 (R d ) and C is continuous (as assumed in this paper), then the spectral 
density necessarily exists (equivalently F is differentiable), (p = J 7 (Co), and ip is 



continuous and bounded. See e.g. pages 331-332 in Yaglom (1987). 



Alternatively, if C E L 2 (. 
exists, since we can define ip 



) and Co is continuous, the spectral density (p also 
J 7 (Cq) in L 2 (R d ) as explained above. In this case, 
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ip is non- negative, belongs to L 1 (M d ) fl L 2 (R d ), but is not necessarily continuous or 
bounded. Note that if C G L 1 ^), then C G L 2 (R d ) by continuity of C . 



Proposition 5.1. Under (CI) and ( pTTj ), if C G L 2 ( 

< 1. 



i/ien fCU^ equivalent 
(5.4) 



Proof. Consider any compact set S C M d . For fa G L 2 (S), define /is G L 2 (R d ) by 
^s(^) — M^) if x E S and /is(:r) = otherwise. From ( |5.1 ), the integral operator 



T5 associated to C on L (S) (see (2.6)) becomes the convolution operator given by 



T s (h)(x) = C *hs(x) 



C (x - y)h(y) dy, x e S. 



Recall that the spectrum of Ts consists of all A G C such that the operator 
Ts — XIs is not invertible or it is invertible and unbounded (with respect to the 
usual operator norm), where Is denotes the identity operator on L 2 (S). 

Consider the multiplicative operator Q v on L 2 (R d ) associated to ip, i.e. Q (p (h)(x) = 
ip(x)h(x) for h G L 2 (R d ). Its restriction to L 2 (S) is given by Q<p,s(h) = Q<p s {hs) 
for h G L 2 (S). Note that T s {h) = F- x Q 9 F{h s ) for h G L 2 (S). Since the Fourier 
operator is a unitary operator (as J-'J 7 ^ 1 = T~ X J- = I where I denotes the identity 
operator on L 2 (M. d )), the spectrum of T s is equal to the spectrum of Q^g, which 
in turn is equal to ess-im(<£>s) (the essential image of <ps), see (12) in Section 8.4.3 
Birman and Solomjak (1987). In our case, ess-im^^) is the closure of <p(S). 
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Consequently, (C2) is equivalent to (f < 1. 



□ 



Assumption 5.2. Henceforth, in addition to (CI), we assume that Co G L 2 (R d ) 



and that (5.4) holds. 



The following corollary becomes useful in Section 5.4 where we discuss a spectral 
approach for constructing stationary DPPs. 



Corollary 5.3. Under (5.1) the following two statements are equivalent. 



(i) There exists ip G L 1 (R d ) with < ip < 1 and C = J 7 ' 1 ^). 

(li) Conditions (CI) and (C2) hold and C G L 2 (R d ). 

Proof. Assume (i). Then < (p < 1 implies that J \(p(x)\ 2 dx < J \<p(x)\ dx < 00, 
i.e. <p G L 2 (R d ), and so by Parsevals identity C G L 2 (R d ). Further, C = J 7 ' 1 ^) 
with ip G L 1 (IR d ), so Cq is continuous. By Bochner's theorem, the continuity of Co 
and the non-negativity of ip imply that Cq is positive-definite, and so (CI) follows 
from (5.1). Moreover, (C2) holds by Proposition 5.1. Hence (i) implies (ii). 

Conversely, assume (ii). Combining Bochner's theorem and the fact that Cq is 
continuous and Co G L 2 (R d ), we deduce that there exists <p G L x (]R d ) such that 
C = ^{cp) (see also page 104 in |Yaglom| ( [l987[ )). By (CI), we have that cp > 0. 
The fact that ip < 1 follows from Proposition 5.1. Hence (ii) implies (i). □ 
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For later purposes, when considering a parametric model for Co with parameters 
p and 9, notice the following. For each fixed value of 9, < p < p max where 
(6) may depend on 9 and is determined by (5.4). As exemplified in 

Pmax 



Pmax Pmf 

Section 5.2 



will be a decreasing function of the range of correlation (which 
only depends on 9). On the other hand, it may be more natural to determine the 
range of 9 in terms of p and a given maximal range of correlation. Finally, in order 
to work with the density given in Theorem 4.1, we may require that <p < 1. 



5.2 Examples of covariance models 



Numerous examples of stationary isotropic covariance functions exist (see e.g. Gelfand 



et al. 2010), while examples of stationary anisotropic covariance functions are dis- 
cussed in De laco et al. (2003). This section starts by considering the simple ex- 



ample of the circular covariance function and continues with a brief discussion of 
the broad class of stationary isotropic covariance functions obtained by scaling in 
normal-variance mixture distributions, where a few specific examples of such mod- 



els are considered in more detail. Section 5A_ discusses further examples based on a 
spectral approach. 

It may be appealing to construct isotropic covariance functions Cq(x), where the 
range 



5 = sup{||x|| : C (x) 7^ 0} 



(5.5) 



is finite. Examples of such finite range covariance functions are given in Wu| (1995) 
and Gneiting (2002). By Definition 2.1, if A, B C IR d are separated by a distance 



larger than 5, then Xa and X% are independent DPPs. Let d = 2 and consider the 
circular covariance function with finite range 5 > and given by 



C (x) 



2 

p— | arccosl 



,0 



\x\\/6)-\\x\\/5y/l-(\\x\\/5y 



\x\\ < 5. 



Note that tt5 2 Cq(x)/(Ap) is the area of the intersection of two discs, each with 
diameter 5, and with distance ||x|| between the centers. Since this area is equal to 
the autoconvolution of the indicator function of the disc with center at the origin 
and with diameter S, the associated spectral density becomes 

V (x)=p/7r(J 1 ( 7 r5\\x\\)/\\x\\) 2 

where J\ is the Bessel function of the first kind with parameter v = 1. This spectral 
density has maximal value <p(0) = p7n5 2 /4, so by (5.4), a DPP with kernel Cq exists 
if < p < p max , where p max = 4/(n5 2 ). Therefore, 



we require 



p5 2 < 4/tt. 



(5.6) 



In this paper we only use the circular covariance function to understand well the 



quality of our approximations in Section 5.3 



In the sequel we focus on more interesting classes of covariance functions. Let 
Z be a c?-dimensional standard normally distributed random variable, and W be 
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a strictly positive random variable with E(W~ d ^ 2 ) < oo, where Z and W are in- 
dependent. Then Y = \[WZ follows a normal-variance mixture distribution, with 
density 

h(x) = E [W~ d/2 exp (— ||a;|| 2 /(2iy))] /(2n) d ^ 2 , x E R d . 
Note that h(0) = sup h, and define 

C (x)=ph(x)/h{0), xeR d . 

The Fourier transform of Co is 

<p(x) = pE [exp {-2tx 2 \\x\\ 2 W)] /h(0), x e R d 

which is positive, showing that Co is a stationary isotropic covariance function. Note 
that ip is given by the Laplace transform of W. By (5.4), a stationary DPP with 
kernel Co exists if < p < p max , where p is the intensity and 

Pmax = h(0) = E(W- d / 2 )/(2n) d / 2 . 



Gneiting (1997) presents several examples of pairs h and J-'(h) in the one- 



dimensional case d — 1, and these examples can be generalized to the multivariate 
case. Here we restrict attention to the following three examples, where Y follows 
either a multivariate normal distribution or two special cases of the multivariate 
generalized hyperbolic distribution (Barndorff-Nielsen, 1977, 1978). We let T(a,b) 
denote the Gamma-distribution with shape parameter a > and scale parameter 
b > 0. 

First, taking y/2~W = a, where a > is a parameter, we obtain the Gaussian 
(or squared exponential) covariance function 

C (x) = pexp (-||x/a|| 2 ) , xeR d , (5.7) 

and 

(p(x) = p(y/ira) d exp (-\\Trax\\ 2 ) , x eR d . 

Hence 

Pmax = (Vna)~ d (5.8) 

is a decreasing function of a. 

Second, suppose that W ~ T{y + d/2, 2a 2 ) where v > and a > 0. Then 

h( X ) - ii^ir-Mii^ii) Rd 

1 ' 2"+ d -!( ^a) d T(u + d/2) ' ' 
where K v is the modified Bessel function of the second kind (see Appendix C). Hence 

C (x) = p^||x/a|r^(||x/«||), xeR d , (5.9) 

is the Whittle-Matern covariance function, where for v = 1/2, Cq{x) = pexp(—\\x\\/a) 
is the exponential covariance function. Moreover, 

, N T(v + d/2) (2J^a) d . 
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so 



T(u) 



Pn 



T(u + d/2)(2y/^a) d 

is a decreasing function of v as well as of a. 

Third, suppose that 1/W ~ T(u, 2a~ 2 ) where v > and a > 0. Then 



(5.10) 



h(x) 



T{v + d/2) 



T(u)(^a) d (l + \\x/a\\ 2 Y +d/2 '' 
is the density of a multivariate t-distribution, and 



x G 



C (x) 



P 



x G 



(l + \\x/a\\ 2 ) u+d/2 
is the generalized Cauchy covariance function. Furthermore, 



(5.11) 



dnl— v 



y(x) 



p(V^a) d 2 
T(u + d/2) 



so 



\2irax\\ u KJ\\27cax\ 



T(u + d/2) 



x G 



(5.12) 



rmax T(u)(^a) d 

is an increasing function of v and a decreasing function of a. 

For later use, notice that the Gaussian covariance function (5.7) with a = l/y/np 
is the limit of both 



(i) the Whittle-Matern covariance function (5.9) with a = l/y/Anup, and 

(ii) the Cauchy covariance function (5.11) with a = \J v / (up) 
as v — > oo. 



We refer to a DPP model with kernel (5.7), (5.9), or (5.11) as the Gaussian, 
Whittle-Matern, or Cauchy model, respectively. In all three models, a is a scale 
parameter of Cq. For the Whittle-Matern and Cauchy models, v is a shape parameter 
of Co- The isotropic pair correlation functions are 



for the Gaussian model: 9o( r ) 
for the Whittle-Matern model: go (r) 
for the Cauchy model: 9o( r ) 



1 - exp (-2(r/a) 2 ) 
1 - $ x - v (rla) v K, 
1 - [1 + [r/af 



r > 0: 



v yr/a)/T{u)\ 

2u-d 



r > 0; 



r > 0. 



In the sequel, let d = 2. For a given model as above, we choose the range of 
correlation r such that go( r o) = 0.99, whereby the isotropic correlation function 
given by (5.3) has absolute value 0.1. While it is straightforward to determine tq for 



the Gaussian and Cauchy model, ro is not expressible on closed form for the Whittle- 



Matern model, and in this case we use the empirical result of Lindgren et al. (2011). 



The ranges of correlation for the Gaussian, Whittle-Matern and Cauchy models are 
thereby 



log(O.l), r 



aVo.l-V^+i) 



(5.13) 
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v = 0.25, p max = 324 
v = 0.50, p max = 337 
v = 1.00, p max = 329 
v = 2.00, p max = 315 

V = °°, Pmax = 293 




v = 0.50, p max = 232 



= 275 



V=1.00, p max 
v = 2.00, p max = 294 
v = 4.00, p max = 298 
v = ~, Pmax = 293 



0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05 

(a) (b) 

Figure 2: Isotropic pair correlation functions for (a) the Whittle-Matern model 
and (b)| the Cauchy model. Each black line corresponds to a different value of 
the shape parameter u, and as a reference the pair correlation function for the 
Gaussian model [y = oo) is shown in gray in both plots. For each model the scale 
parameter a is chosen such that the range of correlation is fixed at r = 0.05, and the 
corresponding value of p max is reported in the legend. The circles show values of the 
approximate isotropic pair correlation function obtained by using the approximation 



C app described in Section |5.3 
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respectively. So r depends linearly on a. Notice, when v is fixed, the upper bound 
Pmax on the intensity decreases as r increases, since p max is proportional to r$ d , cf. 



(5.8), (5.10), and (5.12). There is a similar trade-off between how large the intensity 



and the range of the circular covariance function can be, cf. (5.6) 



Figure [2] shows examples of the isotropic pair correlation functions with a fixed 
range of correlation. In particular the Whittle-Matern models are seen to constitute 
a quite flexible model class with several different shapes of pair correlation functions. 
From the figure it is also evident, that the value of p max is of the same order of 
magnitude for all these models indicating that the range of interaction has a major 
effect on the maximal permissible intensity of the model. 



For Ripley's fsT-function (Ripley 1976, 1977) 



pr 

K(r) =2ir tg (t)dt, r > 
Jo 

we have 

for the Gaussian model: Kir 



(5.14) 



for the Cauchy model: Kir) 



irr 



nr 



"M 1 — p (=£-)) 



2v + d-l \a 



a 



r + a 



while for the Whittle-Matern model the integral in (5.14) has to be evaluated by 
numerical methods. 

Recall that pK(r) is the conditional expectation of the number of further points 
of X in a ball of radius r centered at x given that X has a point at x. For two DPPs 
DPP(Ci) and DPP(C2) with common intensity p and corresponding i^-functions 
K\ and K2, we say that DPP(Ci) exhibits stronger repulsiveness than DPP(C*2) if 
Ki(r) < K 2 (r) for all r > 0. If the corresponding pair correlation functions g\ and 
g 2 are isotropic, i.e. gi{x,y) = g i0 (\\x - y\\), i = 1,2, then 



K-i < K 2 if and only if g w < g 20 - 



(5.15) 



In this sense, within each class of the Gaussian, Whittle-Matern, and Cauchy 
models, when v is fixed, the degree of repulsiveness increases as a increases. How- 
ever, the increased degree of repulsiveness comes at the cost of a decreased maximal 



intensity cf. (5.8), (5.10), and (5.12). For fixed p and u, a has an upper limit Q; max 
given by 

"max = l/v^P' "max = 1/ Ct max = yj vj (lip) (5.16) 

for the Gaussian, Whittle-Matern, and Cauchy models, respectively. Letting a = 
a max , the degree of repulsiveness of both the Whittle-Matern and the Cauchy models 
grows as v grows, and the limit is the Gaussian case, cf. (i)-(ii) above. Moreover, we 
consider the variance stabilizing transformation of the i^-function, L(r) = ^ K(r)/ir 



(Besag, 1977), and Figure |3l shows L(r) —r for seven different models, which clearly 



illustrates the dependence between the degree of repulsiveness and v. 



5.3 Approximation 

This section describes approximations of the distribution, kernel and density of the 
stationary DPP X restricted to a unit box [—1/2, l/2] d . Furthermore, it is explained 
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Whittle-Matern 

Cauchy 

Gaussian 



Figure 3: Plots of L(r) — r vs. r for the Whittle-Matern, Cauchy, and Gaussian 
model with a = a max and p = 100. For the Whittle-Matern and Cauchy models, 
v G {0.5,1,2}. The horizontal line at zero is L(r) — r for a stationary Poisson 
process. 



how X can be approximately simulated on any rectangular set and how the density 
of X on such a set can be approximated. Throughout this section, S = [—1/2, l/2] d , 
S/2 = [-1/4, l/4] d , and 25 = [-1, l] d . 

Approximation of the kernel C 

Consider the orthonormal Fourier basis of L 2 (S) given by 

(j) k ( x ) = e 2nik - x , keZ d , xeS, (5.17) 
where Z denotes the set of integers. For u G S, the Fourier expansion of C (u) is 



where 



k&L d 



oc k = I Co(t)e-™dt. 
's 



Substituting the finite integral in (5.18) by the infinite integral 

cp(k) = J C (t)e- 2wifc -'dt 
leads us to approximate Co on 5" by 

kez d 

So we consider the approximation 

C Q {u) w C apPi0 (M), ue S. 
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(5.18) 



(5.19) 



(5.20) 



Since x — y £ S if x, y £ S/2, this leads to the following approximation of the kernel 

C on 5/2 x S/2 



C(x, y) « C ap p(x, y), x, y G 5/2, 



(5.21) 



where C app (x,y) = C apPi0 (a; - y), x,y E S/2. 



Comparing (5.18) and (5.19) we see that the error of the approximations (5.20) 
and (5.21) is expected to be small if Co(t) ~ for t G WL d \ S. In particular, for 



covariance functions with finite range 5 < 1/2 (see (5.5)), C (t) = for t G K \ S 



and so C («) = C apPi o(w) for u G 5, i.e. the approximations (5.20) and (5.21 ) are then 
exact. For instance, considering the circular covariance function and the existence 



condition (5.6), we have 5 < 1/2 if p > 16 /n, which indeed is not a restrictive 
requirement in practice. 

Appendix [C] studies the accuracy of the approximation Cq(u) ~ C apPi o(u), u G 5, 
for the Whittle-Matern model introduced in Section 15.21 and show that the error is 



small provided the intensity p is not too small. Furthermore, Figure [2] in Section 5J2 
indicates that the approximation is accurate for the examples in the figure as the 
approximate pair correlation functions marked by circles in the plot are very close 
to the true curves. 

For later purpose, we consider the periodic kernel defined on S x S as 

C^(x,y) = Cr(x-y), x,y G 5, 

where Cg er is the periodic extension of Cq from S to 25 defined by 



Cr(u) 
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Evaluating C pcr (x, y) for x,y G 5 corresponds to wrapping [—1/2, l/2] d on a torus 
and evaluating C(x^\y^), where and y^' are the points on the torus corre- 
sponding to x and y. 



Finally, following (5.20), we use the approximation Cg er (-u) rs C%™ (u), u G 25, 
where C^ p Q {u) = J2 keZ d <p(k)e 2nlk ' u , u G 25, which leads to the approximation of 
C peT on 5 x 5 : C pcr (x,y) w C^ r p (x,y) where 



(5.22) 



Note that for x,y G 5/2, C per (x,y) = C(x,y) and C p °i(x,y) = C app (x,y). 



The border method 



Since ip < 1, the DPP X per ~ DPP s (CP p c p is well-defined. We can think of X per as 
a DPP on the torus with a kernel approximately corresponding to C on the torus. 
Furthermore, we can approximate X$/2 ~ DPP(C; 5/2) by = X per fl 5/2 ~ 

DPP 5 (CPpp; 5/2). Thus to approximately simulate X s /2 we need to be able to 



simulate X per , which is straightforward since (5.22) is of the form required for the 



simulation algorithm of Section |3j Recall that for finite range covariance functions 
with 5 < 1/2 (see (5.5)), the simulation is exact (or perfect) as X s /2 and X^ 
identically distributed. 



arc 
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More generally suppose we want to simulate Xr where R C M, d is a rectangular 
set. Then we define the affine transformation T(x) = Ax + b such that T(R) = 5/2. 
Then Y = T(X) is a stationary DPP, with kernel given by (5.2) and spectral density 
(Py{%) — tp{A T x). Let F per be the DPP on 5 with kernel (5.22) where <p is replaced 
by (py Then we simulate Y per and return T^iY^ 1 n 5/2) as an approximate 
simulation of Xr. We refer to this simulation procedure as the border method for 
simulating Xr. 



The periodic method 

From our practical experience it appears that DPP s{C P pp) is also a good approxima- 
tion of DPP(C; 5), which may be harder to understand from a purely mathematical 
point of view. Intuitively, this is due to the fact that the periodic behaviour of 



C^pp o mimics the influence of points outside 5. To illustrate this, Figure ^ 



shows 

the acceptance probability for a uniformly distributed proposal (used for rejection 
sampling when simulating from one of the densities pi, see Remark |3.5) when X per 



is simulated by the algorithm in Section [3] The qualitative behavior of the process 
in 5 and in the interior region 5/2 are similar in the sense that there are regions at 
the border where the acceptance probability is low. For the process on 5/2 this is 
due to the influence of points outside 5/2 whereas for the process on 5 this influence 
is created artificially by points at the opposite border. 




(a) 




Figure 4: (a) Acceptance probability for a uniformly distributed proposal at an 
intermediate step of the simulation algorithm of Section [3] The process being sim- 
ulated is X per on 5 = [—1/2, 1/2] d and the interior box corresponds to the region 
5/2 = [—1/4, l/4] d . The black points represent previously generated points, and the 



acceptance probability is zero at these points, (b) Empirical means and 2.5% and 
97.5% pointwise quantiles of L(r) — r using either the periodic method (gray lines) 
or the border method (black lines), and based on 1000 realizations of a Gaussian 
model with p = 100 and a = 0.05. The dashed line corresponds to the theoretical 
L(r) — r function for this Gaussian model. 
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This approximation gives us an alternative way of approximately simulating 
Xr where R C M. d is a rectangular set as follows. We simply redefine the affine 
transformation above such that T(R) = S. Then Y = T(X) is again a stationary 
DPP with spectral density Py{%) = p(A T x), and T _1 (l^ cr ) is an approximate 
simulation of Xr. We call this the periodic method for simulating Xr. 

The advantage of the periodic method is that we on average only need to generate 
p\R\ points whereas the border method requires us to generate 4p|i?| points on 
average. To increase the efficiency of the border method we could of course use 
a modified affine transformation such that T(R) = S' with S/2 C S' C S, but 
we will not go into the details of this, since it is our experience that the periodic 
method works very well. In particular we have compared the two methods for 
simulating DPPs with kernels given by circular covariance functions. In this case the 
border method involves no approximation and comparison of plots of the empirical 
distribution of various summary statistics revealed almost no difference between the 
two methods (these plots are omitted to save space). 

For a Gaussian covariance function, Figure ^[b) shows empirical means and 



2.5% and 97.5% pointwise quantiles of L(r) — r using either the periodic method 
(gray lines) or the border method (black lines), and based on 1000 realizations of 
a Gaussian model with p = 100 and a = 0.05. The corresponding curves for the 
two methods are in close agreement, which suggests that the two methods generate 
realizations of nearly the same DPPs. This was also concluded when considering 
other covariance functions and functional summary statistics (plots not shown here). 
In Figure ^[bj the empirical means of L(r) — r are close to the theoretical L(r) — r 
function for the Gaussian model, indicating that the two approximations of the 
Gaussian model are appropriate. 

The computational efficiency of the periodic method makes it our preferred 
method of simulation. The 1000 realizations used in Figure 4 \b)\ were generated 



in approximately three minutes on a laptop with a dual core processor. 
Approximation of the density / 

First, consider the density / for X$ as specified in Theorem 4.1 (so we assume 
ip < 1). We use the approximation / fa / per , where / per denotes the density of X per . 
Letting 

<p(u)=<p(u)/(l-<p(u)), ueS, (5.23) 

Ci;;(x,y) = C^ (x-y) = J>(A;)e 27ri ^), x,yeS, (5.24) 

kez d 

and 

!>&=5>g(l + ¥KA0) (5-25) 

we have 

r*({ Xl , x n }) = exp(|S| - D%) det[C^]( Xl , ...,x n ), {x u . . . , x n } C S. 

(5.26) 

This density can be approximated in practice by truncating the infinite sums defining 
Cfpp and -D P pp. Furthermore, the speed of calculation can be increased by using a 
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fast Fourier transform (FFT) to evaluate C'appO- The details of the truncation and 



use of FFT are given in Section 6.1 



Second, consider the density of Xr, where R C M. is rectangular. Then we use 
the affine transformation from above with T(R) = S to define Y = T(X). If /y er 



denotes the approximate density of Y as specified by the right hand side of (5.26), 
we can approximate the density of Xr by 

/ per ({x!, . . . , x n }) = \R\~ n exp(\R\ - ISDf^iTdx!, . . . , x n })), {x 1} . . . ,x n } C R. 

We call / per the periodic approximation of /. The simulation study in Section [6] 
shows that likelihood inference based on / pcr works well in practice for the examples 
in this paper. Appendix [D] introduces a convolution approximation of the density 
which in some cases may be computationally faster to evaluate. However, as dis- 



cussed in Appendix D, this approximation appears to be poor in some situations 



and in general we prefer the periodic approximation. 

Remark 5.4. It would have been desirable if we could compute explicitly the spec- 



tral representation (2.7) for a given parametric family of the covariance function Co 
and for at least some cases of compact sets S (e.g. closed rectangles). Unfortunately, 
analytic expressions for such representations are only known in a few simple cases 



(see for instance Macchi (1975)), which we believe are insufficient to describe the 
interaction structure in real spatial point process datasets. Numerical approxima- 
tions of the eigenfunctions and eigenvalues can be obtained for a given covariance 
function Co. However, in the simulation algorithm we may need to evaluate the 
eigenfunctions at several different locations to generate each point of the simula- 
tion, and the need for numerical approximation at each step can be computationally 



costly. On the other hand, the Fourier approximation (5.22) is very easy to apply. 



This requires that the spectral density associated to Co is available, which is the 



case for the examples given in Section 5.2 



5.4 Spectral approach 

As an alternative of specifying a stationary covariance function Co, involving the 
need for checking positive semi-definiteness, we may simply specify an integrable 



function ip : R — > [0, 1], which becomes the spectral density, cf. Corollary 5.3 In 



fact knowledge about ip is all we need for the approximate simulation procedure 



and density approximation in Section 5.3. However, the disadvantage is that it may 



then be difficult to determine Co = J 7-1 (ip), and hence closed form expressions for 
g and K may not be available. Furthermore, it may be more difficult to interpret 
parameters in the spectral domain. 

Quantifying repulsiveness 



In Section |5\2 we said that DPP(Ci) exhibits stronger repulsiveness than DPP(C2) if 
their intensities agree and their corresponding .ff -functions satisfy Ki < K 2 . Consid- 
ering Figure [3j we cannot always use this concept when comparing a Whittle-Matern 
model with a Cauchy model. 
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Instead, for any stationary point process defined on R d , with distribution P, 
constant intensity p > 0, and pair correlation function g(x,y) = g(x — y) (with a 
slight abuse of notation), we suggest 



p = p 



J t 1 - 9{ x )\ dx 



as a rough measure for repulsiveness provided the integral exists. Denote o the origin 
of R and note that the function x — >■ pg(o,x) = pg(x) is the intensity function for 
the reduced Palm distribution P], cf. ( |2.11| . Therefore, /i is the limit as r — » oo of 
the difference between the expected number of events within distance r from o under 
respectively P and P]. For a stationary Poisson process, p = 0. For any stationary 



point process, we always have p < 1 (see e.g. (2.5) in Kuna et al. (2007)). When 
g < 1 (as in the case of a DPP), we clearly have p > 0, so that < p < 1. 
Especially, for a stationary DPP, 



M = P j[l-g(x)}dx= l - j \C (x)\ 2 dx = - j 



|<^(a;)| 2 dx 



where the second equality follows from (2.3) and (2.4), and the last equality follows 



from Parseval's identity. Using an obvious notation, we say that DPP(Ci) is more 
repulsive than DPP(C 2 ) if p\ = P2 and p\ > p 2 . In the isotropic case, this is in 
agreement with our former concept: if pi = p 2 , then K\ < K 2 implies that p± > p 2 , 



cf. (5.15). 



Suppose we are interested in a stationary DPP with intensity p and a maximal 
value of p. Since < f{x) 2 < <p{x) < 1, we have p = 1 if and only if j <p(x) 2 dx = 
J ip(x) dx = p. So p is maximal if tp is an indicator function with support on a Borel 
subset of lR d of volume p. An obvious choice is 

I 1 if \\x\\ < r , 

<p(x) = \ 11 11 " (5.27) 
I otherwise 

where r d = pdT(d/2) j l {2 r K d l 2 ). For d — 1, Cq is then proportional to a sine function: 

Co(x) = sm(7ipx)/(irx) if d= 1. (5.28) 
For d = 2, Cq is then proportional to a 'jinc-like' function: 

C (x) = y/pJ 1 (2 y /¥p\\x\\)/\\x\\ if (2 = 2. (5.29) 

A general class of spectral densities 

In the following we first describe a general method for constructing isotropic models 
via the spectral approach. Second, this method is used to construct a model class 
displaying a higher degree of repulsiveness than the Gaussian model which appears 



as a special case. In particular, as shown after (5.36) below, the extreme case (5.27) 
is a limiting case of this class. 

Let / : [0, oo) —> [0, oo) be any Borel function such that sup / < oo and < c < 
oo, where 

d7T d / 2 



oo 

d-l 



c= j f(\\x\\)dx = T ——- j r d -V(r)dr. (5.30) 
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Then we can define the spectral density of a stationary isotropic DPP model as 



<p(x) = pf(\\x\\)/c, xe 



(5.31) 



where p is the intensity parameter. The model is well-defined whenever 

P < Pmax = C/ SUp /. (5.32) 

Below we give an example of a parametric model class for such functions /, where 



the integral in (5.30) and the supremum in (5.32) can be evaluated analytically. 

Assume Y ~ r(7,/3) and let / denote the density of V 1 /", where 7 > 0, (3 > 0, 
and v > are parameters. Let a = (3~ l l u , then by (5.30) and (5.31), 



dn d / 2 T(>y + 



d+V 



and 



We have p T 



T(d/2 + i)r( 7 ) 

V(d/2 + l)vat 



a 



l-d 



V > 



ax\^ v *exp( 



ax 



if •yu < 1, and 

c c/vr d / 2 a- d r(7 + exp( 7 - 1/z/) 



(5.33) 



/(( 7 -l/i/)i/") 



r(rf/2 + 1)^(7 - 1/z/)^- 1 /^ 



if 7 i/>l. (5.34) 



We call a DPP model with a spectral density of the form (5.33 ) a generalized gamma 



model. For 71/ > 1, the spectral density (5.33) attains its maximum at a non-zero 



value, which makes it fundamentally different from the other models considered in 
this paper where the maximum is attained at zero. 

In the remainder of this section, we consider the special case 7 = 1/z/, so 



<fi(x) = P 



T(d/2 + l)va c 
dn d / 2 T(d/v) 



exp(— ||ax|| 1/ ). 



(5.35) 



We call a DPP model with a spectral density of the form (5.35) a power exponential 
spectral model. For v = 2, this is the Gaussian model of Section 5.2 . 

For the power exponential spectral model, a max is given in terms of p and v by 
= Y(d/v + l)r~ d , where r is defined in (5.27). For the choice a = a max in 



a 



(5.35), the spectral density of the power exponential spectral model becomes 

(p(x) = exp{-\\T{d/v + l) 1/d x/r\\ u ). 



(5.36) 



This function tends to the indicator function (5.27) as v tends to 00. Thus the power 



exponential spectral model contains a 'most repulsive possible stationary DPP' as 
a limiting case. 

Figure [5] illustrates some properties of the power exponential spectral model 
when a = a max and v = 1, 2, 3, 5, 10, 00. Recall that v = 2 is the Gaussian model. 

Figure E pt) shows the spectral densities for these models, and it clearly illustrates 
how the spectral density approaches an indicator function as v — > 00. 
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Figure 5: 
functions 



and 

models with p 



Isotropic spectral densities, (b) approximate isotropic pair correlation 
approximate L(r) — r functions for power exponential spectral 
100, v = l,2,3,5,10,oo and a = a max the maximal permissible 



value determined by (5.34). 



Figure £|[b) shows the pair correlation functions g. Since we are not aware of a 
close form expression for Cq = J r_1 (<^) when ip is given by (5.36), we approximate 
Co by the periodic method, leading to approximating pair correlation functions in 
Figure 5|[b) The figure shows that the repulsiveness of the process increases as v 
increases. Notice the slightly oscillating nature of g for large values of u, which 
at first may appear to be an artifact of models (5.28) and (5.29), but in fact such 
behaviour is expected for very repulsive processes (consider e.g. a stationary point 
process which is so repulsive that all of its realisations are uniform translations of a 
rectangular lattice, then g is periodic). 

Figure |q^c) shows the corresponding approximations of L(r) —r (analogously to 
Figure |3j in Section 5.2). The figure confirms once again that the repulsiveness of 



the process increases as v increases. 



6 Inference 



In this section, we discuss how to estimate parameters of stationary DPP models 



and to a certain extent how to do model comparison and model checking. Section 6A_ 
focuses on maximum likelihood based inference, while Section 16.21 discusses alter- 



native ways of performing inference. In Section [6~3 the approaches of Sections 6A 



and 6.2 are compared in a simulation study. Finally, in Section 6.4, a parametric 
determinantal point process model is fitted to a real data set. 

We assume that X is a stationary DPP and that the kernel C is given by one 
of the parametric covariance models described in Sections 5^2 and 5^4 . Further, we 
let x = {x\, . . . , x n } denote a realization of X$, where S is a bounded rectangular 
region and we refer to x as the data. 

The covariance function is assumed to be parametrized by the intensity p and an 
additional parameter 9 for the corresponding correlation function. Irrespective of the 
estimation procedure used, we always estimate p using the unbiased non-parametric 
estimate p = n/\S\. This is a computationally simple estimate and it reduces 
the parameter dimension for the subsequent estimation procedure. The estimate p 
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introduces a bound on the parameter space, since the remaining parameters have to 
satisfy the restriction p max (6 l ) > p, cf. Section 5.2 



6.1 Maximum likelihood based inference 

In order to perform maximum likelihood based inference, we approximate the like- 



lihood function with respect to 9 by the density / per of Section 5.3 where we use 
truncation and FFT to evaluate / per . This is described in more detail in the follow- 
ing. 



and (5.25) by 



Let Z N = {-N, -N + 1, . . . , N - 1, N} and define truncated versions of fl5.24fl 

D N = J2 Ml + 



and 



C N (u) = ¥(k)e 27Tik - u , u G 



(6.1) 



(6.2) 



Note that D N and CV depend only on 9 through (p given by (5.23). For a given N 



(the choice of N is discussed below), the approximate maximum likelihood estimate 
(MLE) is the value of 9 which maximizes the approximate log-likelihood 



■N 



(9) = logdet[C , 7V ](x 1 



- D 



N 



where [CVK#i, • • • ,x n ) is the n x n matrix with (i, j)'th element Cn{xi — Xj). If 9 
is one dimensional, the maximum of £n{9) can be determined by a simple search 



algorithm, otherwise the simplex algorithm by Nelder and Mead ( 1965 ) can be used. 
Note that these methods do not require explicit knowledge of the derivatives of £n(9). 



While it is feasible to evaluate (6.1) for large values of N, the evaluation of ( |6.2 ) 
is more problematic since it needs to be carried out for every pair of points in x. For 



moderate N (few hundreds) direct calculation of (6.2) can be used, but for large N 
(hundreds or thousands) we use the FFT of (p. The FFT yields values of Cat at a 
discrete grid of values and we simply approximate C^{xi — Xj) by the value at the 
closest grid point. 

Concerning the choice of N, note that the sum 



S 



N 



tends to p from below as N tends to infinity. Hence, for any value of 9, one criterion 
for choosing iV may be to require e.g. Sn > 0.99p. However, this may be insufficient 
as iV also determines the grid resolution when FFT is used, and a high resolution 
may be required to obtain a good approximation of the likelihood. Therefore, we 
use increasing values of N until the approximate MLE stabilizes. 

When comparing several different models fitted to the same dataset (e.g. Gauss, 
Whittle-Matern, and Cauchy), we prefer the model with the largest value of £n(9). 
The comparison of £n{9) between different model classes is valid, since the domi- 
nating measure is the same for all the models. 
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6.2 Alternative approaches for inference 



Given a parametric DPP model there are several feasible approaches for inference 
which are not based on maximum likelihood. For example, parameter estimation can 
be based on composite likelihood, Palm likelihood, generalized estimating equations, 



or minimum contrast methods. See M0ller and Waagepetersen (2007), Prokesova 



and Jensen (2010), and the references therein. Here we only briefly recall how the 
minimum contrast estimate (MCE) ( |Diggle and Gratton 1984) is calculated. 

Given a value of 9, let s(r; 9), r > 0, denote a functional summary statistic for 
which we have a closed form expression. In our examples this will be either the 
pair correlation function g or the f^-function. Further, let s(r) be a non-parametric 
estimate of s based on the data x. The MCE based on the functional summary 
statistic s is the value of 9 which minimizes 



D{9) 



sir 



sir: 



\<1\P 



dr 



where the limits of integration 77 < r u and the exponents p > and q > are 



user-specified parameters. Following the recommendations in Diggle (2003), we let 
q — 1/2, p — 2, and r u be one quarter of the minimal side length of S. It is 
customary to use 77 = and we do this when the MCE is based on the i^-function. 
However, when the MCE is based on g, we let r\ be one percent of the minimal side 
length of 5", since in our simulation experiments it turned out to be a better choice. 
To minimize D{9) we use the same method as was used for maximizing £n{9) in 



Section 6.1 , which avoids the use of derivatives of D(9). 

Finally, when several different models are fitted to the same dataset, the one 
with minimal value of D(9) is preferred. 



6.3 Simulation study 

We have generated 500 realizations in the unit square of the following five models: 
Gaussian, Whittle-Matern with v = 0.5, Whittle-Matern with v — 1, Cauchy with 
v = 0.5, and Cauchy with v — 1. For all models, p = 200 and a = a max /2, where 



Q!max is given by (5.16). In our experience it is difficult to identify the parameters v 



and a simultaneously, which is a well-known issue for the Whittle-Matern covariance 



function (see e.g. Lindgren et al. 2011). Here we consider v known such that the 
remaining parameter to estimate is one dimensional, i.e. 9 = a. 

Table [T] provides the empirical means and standard deviations of the MCE based 
on K, the MCE based on g, and the MLE, where for each model, the MLE is 
calculated for several different values of N. In general, we see that as long as the 
truncation is sufficiently large the MLE outperforms the MCE since the former has 
smaller biases and smaller standard deviations. 

The quality of the likelihood approximation is closely related to the decay rate 
of the spectral density of the model, or equivalently to the rate of convergence of 
Sn- Figure [6] shows Sn for different values of iV for each of the five models. It is 
clear that the two Whittle-Matern models approach the theoretical limit p = 200 at 
a slower rate than the other models, and this makes the likelihood approximation 
inaccurate for small N leading to bias in the estimates shown in Table [Tj 
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Table 1: Empirical means and standard deviations (in parentheses) of parameter 
estimates based on 500 simulated datasets for each of 5 different models with in- 
tensity p = 200. Model 1: Gauss; Model 2: Whittle-Matern (u = 0.5); Model 3: 
Whittle-Matern (v = 1); Model 4: Cauchy (v = 0.5); Model 5: Cauchy (y = 1). The 
columns from left to right are: The true value of a, MCE based on the i^-function, 
MCE based on g, MLE with N = 256, MLE with N = 512, MLE with N = 1024, 
and MLE with N = 2048. All entries are multiplied by 100 to make the table more 
compact. 

a K g MLE256 MLE512 MLE1024 MLE2048 



1 2.00 

2 1.40 

3 1.00 

4 1.40 

5 2.00 



2.05 (0.58) 
1.59 (0.88) 
1.02 (0.46) 
1.48 (0.68) 
2.07 (0.83) 



1.99 (0.51) 
1.48 (0.92) 
0.95 (0.54) 
1.30 (0.87) 
1.91 (0.97) 



1.42 (0.25) 
1.77 (0.11) 
0.97 (0.18) 
1.39 (0.23) 
1.69 (0.29) 



2.01 (0.43) 
1.62 (0.56) 

1.00 (0.36) 
1.37 (0.54) 

2.01 (0.61) 



2.01 (0.43) 
1.55 (0.63) 

1.00 (0.37) 
1.38 (0.54) 

2.01 (0.62) 



2.01 (0.43) 
1.52 (0.67) 
1.00 (0.37) 
1.38 (0.55) 

2.02 (0.61) 




Gauss 

Whittle-Matern (v = 0.5) 

--- Whittle-Matern (v = 1) 

Cauchy (v = 0.5) 

Cauchy (v = 1 ) 



50 100 200 500 1000 2000 

Figure 6: Sn as a function of N. 



6.4 Real data example 

Figure [7] shows a plot of the Norwegian spruces dataset available in spat st at. In the 
following we fit a DPP model to this dataset. The aim is not to conduct a detailed 
analysis of this specific dataset, but rather to illustrate that it is practically feasible 
to fit a DPP model to a real dataset. This dataset has previously been modelled by 



a five parameter multiscale process in M0ller and Waagepetersen (2004), using elab- 



orate Markov chain Monte Carlo MLE methods. Below we fit a more parsimonious 
DPP model by using the much simpler and faster methods from Section [6j 

The non-parametric estimate of the intensity is p = 0.063. First, we fit a Gaus- 
sian model to the data. The estimate of p implies a max — 2.248. Both the MLE 
and the MCE of a is a = a max (the MCE based on the f^-function and the MCE 
based on g are identical in this case). Second, we fit a Whittle-Matern model and 
a Cauchy model to the data. For these models all the estimation procedures yield 
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Figure 7: Locations of 134 pine trees in a 56 m by 38 m region. 



estimates 9 = (a, v) with large values of v (not reported here), and the models are 
practically indistinguishable from the fitted Gaussian model, which is the limiting 
model as v — > oo, cf. Section 5.2 The fitted Gaussian model has both the smallest 
value of D{6) and the largest value of £n(9), which leads us to prefer the Gaus- 
sian model. Figure [8] is used to assess the goodness of fit for the Gaussian model. 
It shows non-parametric estimates of L(r) — r, the nearest neighbour distribution 
function G(r), the empty space function F(r), and J(r) = (1 — G(r))/(1 — F(r)), 
together with 2.5% 97.5% pointwise quantiles (gray lines) for these summary statis- 
tics based on 4000 simulations of the fitted Gaussian model (for definitions of F and 
G, see e.g. M0ller and Waagepetersen (2004)). Figure [8] indicates a lack of fit of the 
Gaussian model applied to the Norwegian spruces, and a more repulsive model may 
be appropriate. 

We therefore fit the power exponential spectral model of Section 5.4, where 
we have truncated the parameter space for v to < v < 10, since models with 
larger values of v are almost indistinguishable from the model with v = 10. The 
approximate MLE is 9 = (a, z>) = (6.36, 10). As a is close to the maximal permissible 
value a max = 6.77 for v = 10, the fitted model is close to a 'most repulsive possible 
stationary DPP', that is, the jinc-like function (5.29). This model is judged to be a 



good fit based on Figure [8] where the simulation based quantiles (black lines) cover 
the non-parametric estimates based on the dataset for all the summary statistics. 



Acknowledgments 

We are grateful to Philippe Carmona, Morten Nielsen and Rasmus Waagepetersen 
for helpful comments. Supported by the Danish Natural Science Research Council, 
grant 09-072331, "Point process modelling and statistical inference", and by the 
Centre for Stochastic Geometry and Advanced Bioimaging, funded by a grant from 
the Villum Foundation. 



32 



i i i i i n i i i i i i r 

2 4 6 8 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 




n 1 1 1 1 1 n 1 1 1 1 r 

0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.5 1.0 1.5 2.0 2.5 



Figure 8: Clockwise from top left: Non-parametric estimate of L(r) —r, G(r), F(r), 
and J(r) for the Norwegian spruces dataset. All four plots contain simulation based 
2.5% and 97.5% pointwise quantiles for both the Gaussian model and the power 
exponentialspectral model fitted via MLE. For both models the quantiles are based 
on 4000 realizations. 
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Appendices 



A Proof of Theorem 13.2 



Let the situation be as in Theorem |3.2[ In the sequel, for ease of presentation, we 
ignore null sets. 

We start by proving by induction that for i = n, . . . , 1, (3.5) is a probability 
density and v(X n ), . . . ,v(Xi) are linearly independent (considering complex scalars). 
For i = n and x G S, we have p n (x) = \\v(x)\\ 2 /n > for all x G S, and 

/ p n (x)dx = - [ \\v(x)\\ 2 dx = - I V* \<p k (x)\ 2 dx = 1. 
Js n Js n ^st^i 

Hence p n is a probability density. Clearly, p n (x) = whenever v(x) = 0, so as X n 
is generated from p n , v(X n ) ^ (almost surely). Thus the induction hypothesis is 
verified for i = n. 

Suppose 1 < i < n. By the induction hypothesis, Hi as defined by (3.6) has 
dimension n — i. Let Pj be the matrix of the orthogonal projection from C n onto 
H±. By (gj)), for all x G S, Pi(x) = \\Piv(x) || 2 /i > and 

Pi(x) = whenever v(x) G H^. (A.l) 

By the spectral theorem, Pi = UAiU*, where U is unitary and Aj is diagonal with 
the first i diagonal elements equal to one and the rest zero. Let u k j denote the 
(k, j)'th entry of U . Then 

pAx) = -v(x)*UAiU*v(x) = -\\AiU*v(x)\\ 2 
i i 

where the j'th entry of AiU*v(x) is Yl k =i Ukj^'ki.x) if j < i, and otherwise, so 
Pi(x)dx = - / u kj4>k(x)uij(f)i(x) dx 

^ S 1 -* S j=l k=l 1=1 

= ~^2^2^2 u kjUij cf) k {x)(t)i{x) dx = - ^2^2 l M ^'| 2 / \^k{x)\ 2 dx = 1. 



j=l k=l l=l ° ° j=l k=l 



Thus pi is a probability density. Finally, it follows immediately from (A.l) and the 
induction hypothesis that v(X n ), . . . , v(Xi + i), v(Xi) are linearly independent with 
probability one. 

Hence, the induction hypothesis is verified for all % — n, . . . , 1. 

Now, for iteration i < n, write Pi = Pi(X n , . . . , X i+ i) and H^ = H--(X n , . . . , X i+ i) 
to emphasize the dependence on the previously generated variables. For i = n, set 
P(X n , . . . , X i+l ) = I n and H±(X n , . . . , X l+1 ) = C n . Let 

O = {(xi, . . . , x n ) G 5*" : «(a;i), . . . , v(x n ) are linearly independent} 
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be the support of {X\, . . . ,X n ). Since Pi(x) = \\Piv(x)\\ 2 /i, (X\, . . . ,X n ) has density 

1 - 

p(x 1 ,...,x n ) = — TT \\Pi(x n ,. . . ,x i+1 )v(xi)\\ 2 , (xi,...,x n ) G Q. 
n\ - LJ - 



i=i 



This product is exactly the square of the volume of the parallelepiped determined 
by the vectors v{x\), . . . , v(x n ), which is equal to the determinant of the nx n 
Gram matrix with (i, j) 'th entry v(xi)*v(xj), which in turn is equal to the matrix 
[K](xi, . . .,x n ). Thus, for (x x , . . . , x n ) G Q, 



p(xi 



, ... , x n 



n ! 



det[K](x 



!)•••) X n 



(A.2) 



Moreover, if ( ) G S n \ fi, det[JC] (xi, . . . , x n ) = | det[v(xi) . . . v(x n )} | 2 = 0. 

Hence (A.2) is valid for all (x%, . . . , x n ) G S n . 

Viewed as a point process {X\, . . . , X n }, the number of points is fixed and equal 
to n, and hence by definition of p( n \ 



p {n \x!, ...,x n )= n\p(xi, ...,x n ) = det[K](xi, . . .,x n ), 
This completes the proof of Theorem |3.2[ 



(x u ...,x n )eS n . (A.3) 



B Close upper bounds on the conditional distri- 
butions of Algorithm 1 



In Remark |3.5| we discussed rejection sampling from the densities pi, i = n, . . . , 1, 
using uniform instrumental distributions. For intensive simulations purposes, for 
each i, it is desirable to construct an unnormalized instrumental density which is 
larger than and close to pi as well as easy to simulate from. 

To find such an unnormalized density, we first notice the following. It follows 



from Remark 3.4 that ipi(x) is the norm of a vector obtained after n — i successive 



orthogonal projections of v(x). These projections commute, so that ipi(x) is lower 



than the norm of any projection of v(x) of a lower order. By (3.7) and (3.8), if 
i + 1 < k < n, then 



Pi{x) < 



v(x k )v(x k y 

In TTj f" I Vl.V) 

K(x k ,x k ) 



and so by (3.4) 



i w 1 • ( is( \ \ K i. x ^k)\ 2 
Pi(x) < - mm K[x,x) — — 

l i+l<k<n \ K{X k} X k ) 



% < n. 



(b.i; 



Here the right hand side is an unnormalized density, since it is a continuous function 
of x G S where S is compact. 



The proof of the following lemma uses (B.l ) to derive an explicit upper bound in 
the specific setting of Sectional i.e. in the stationary case, when S = [—1/2, l/2] d , 
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and when the eigenfunctions are Fourier basis functions as in (5.17). Let x = 
(x(l), . . . , x(d)) G R d and y = (2/(1), . . . , y(d)) G R d , and suppose that 

{0i, . . . , <f) n } = {Vh,-jd '■ h e M n i)> ■ ■ ■ Jd e Jd{n d )} 

where <fj u ...,j d (x) = exp (^2ni Ylt=i3k x (k)J and for g = 1, . . . , d, J q {n q ) denotes some 

finite subset of Z with n q elements, such that n = Ylq=i n q- Then the projection 
kernel (3.4) becomes 



9=1 jq&Jq{n q ) 



(B.2) 



Moreover, for any r G N, denote S q (r) = Y^j q ejj n anc ^ ^ or an y numrj er a, define 
a + = max(a, 0). 

Lemma B.l. Let K be the projection kernel ( B.2[ ). For step i — n — 1, . . . , 1 of 
Algorithm^ given the n — i previous points x^ = (xfc(l), . . . ,Xk(d)), k = i + 1, . . . ,n, 
we have 

pt(x) < 7 ^ - , + ?< a , x < n n ( X - ^ \ X M - *k{q)\^n q S q {2) J • (B.3) 

Proof. For ijgR, let K q (x,y) = £V gJ / , e 27 ™-^' - ^. An analytic expansion of 



y)| 2 leads to 



l-^ g (av, 



5>l)'(s - y)^{2^ £ n( ^ , ^(2p - 1)^(1). 



Note that 



f— 1V I 

g«(^ S '( 2p -" S "(')=(2^ E W 



i) 2p > 0. 



(«,i)eJ|(n g ) 



Therefore, the function x — >■ |i£g(x, j/)| 2 can be expanded into an alternate series. In 
particular, for any x, y G R, since S q (0) = n q , 

\K q (x, y)\ 2 >n\- ^\x - yf{n q S q {2) - S 2 q (l)). 

This lower bound is a concave function of \x — y\ when 

\x — y\ < 



2irJn q S q (2)-S*(l) 



and so 



\K q (x,y)\ 2 \K q (x, : 



K q (y,y) 



> (n q -2Tr\x-y\^n q S q (2)-S 2 q (l) 



Combining this with (B.l), we obtain (B.3). 



□ 
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The upper bound in (B.3) provides an unnormalized instrumental density close 
to pi. When d — 1, this instrumental density is a stepwise linear function, and 
hence it is very easy to make simulations under the instrumental density. When 
d=2, the instrumental density provided by (B.3) is a stepwise polynomial function. 
One strategy is then to provide a further upper-bound making rejection sampling 
feasible. In our experience this is not so hard for the DPP models we have considered, 
but since it depends much on the points . . . ,x n and the particular model, it 
seems not easy to state a general result. 



C Fourier approximation of the Whittle-Matern 
covariance function 



Consider the Whittle-Matern covariance function Cq given by (5.9). This appendix 



begins with some preliminary results on K v (the Bessel function of the second kind) 



which appears in (5.9). Next the quality of the Fourier approximation of Cq given 



in Section 15.31 is discussed. 



There are several equivalent ways to define K v . By Equation 8.432 in Gradshteyn 



and Ryzhik| fl2007| ), for all x > and all v > 0, 

KM 



71 



2 



(t 2 - l)"-adt. 



(C.l) 



As x -> 0, then x v KM -> 2"- 1 T(v). Hence by (5.9), C„(0) = p 



The following lemma provides an upper bound and gives an idea of the decay 
rate for K v . The inequality reduces to an equality for v — 1/2. Moreover, according 
to various plots (omitted in this article), the bound seems sharp when v > 1/2. We 
denote 7 = T(l + 2v)~ 1 > '. 



Lemma C.l. For all x > 0, 

KM < 2 u - 1 T(u)x-" (1 - (1 - e-" tx ) 2u ) %jv> 1/2 

and 

KM < Ki/M = v^je" 1 ifv< 1/2. 



(C.2) 
(C.3) 



Proof. When v > 1/2, from flCTTj ) 

KM < 



e -xt t 2u-l df 



— t-x u T(2u,x) 



where T(2z/, •) denotes the incomplete Gamma function with parameter 2v: 



F(2v,x) 



t^M'dt. 



From Alzer (1997) we deduce that 

T{2u,x) < (l- (l-e-^) 2i/ )r(l + 2z/)/(2 
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whenever x > 0, v > 1/2, and < 7 < T(l + 2u) 1 / 2u . Hence (C.2) follows by using 



the relations T{2v + 1) = 2vY(2v) and T(u)T(u + 1/2) = 2 1 - 2u ^T(2v). 

When v < 1/2, using ( |C.1| ) and the fact that t 2 - 1 > 2t - 2 when £ > 1, we 
obtain 



KJx) < 



Finally, making the change of variables u = x(t — 1), we obtain (C.3). 



□ 



For the Whittle-Matern model, the following Proposition C.2 provides an error 



bound for the approximation (5.20) of C (u) by C apP) o("u) when u G [—1/2, l/2] d . 
We let 

(3= LVd (r(l + 2z/) 1 / 2l/ Vl)) 



c(p, v, a, d) 



(Aa) 1 - 2v p 2 Tid/Y{u) 2 if v<\ 
4v 2 p 2 d if v>\ 



, e^ 2e-^ /e"' 3 I 
e(u, a, 1) = -77- + - I -77- + 



/? 1 - e-/ 3 V Z 3 1 - e_/3 



and for d > 2, 



e(u, a,d) = e ^ ( — + 



/3 (l-e-/ 3 ) 2 



+ 



2e"^ 



(3 l-e-P \(3 1 - e 



+ 



d-l 



(C.4) 
(C.5) 

. (C.6) 



Proposition C.2. Let Co be the Whittle-Matern covariance function given by (5.9) 
and let C apPi o be the approximation (5.20) of Co on [— 1/2, l/2] d . If < p < p max 
where p max is gwen fry (5.10), then 



1/2,1/2]° 



|C (x) - C apPi o(x)| 2 da; < c(p, v,a,d)e(v,a,d). 



(C.7) 



Proof. We have 



|C (x) - C apPi0 (x)| dx = 

l/2,l/2] d </[-l/2,l/2] d 



J> fc -^))e 2 -* 



dx 



with 



-l/2,l/2] d 



C (y)e- 



-2irik-y 



dy. 



Defining h(y) = C (y)(l - H-[-i/ 2 ,i/2] d (l/))> we have tp(k) - a k = jF{h)(k) and 

£ (a* - v^)) 2 = £ (^) W) 2 = £ ?( h * W*)- 

fcez d fcez d fcez d 
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The Poisson summation formula on a lattice (see Stein and Weiss (1971), Chap- 
ter VII, Corollary 2.6) gives 

H h *h)(k) = J2h* h(k). 



When v > 1/2, we have 1 — (1 — e ~i x ) 2u < 2ue yx for all x > 0, so from (5.9) and 

h*h{x)= [ C (y)C (x - y) dy < 4pV / e -Z(Uv\\+\\x-v\\) x e R * ( C .8) 

lino denotes the 



v 



v 



where V = V{x) = {y G R d : \\x - > 1/2, Wy]^ > 1/2} and 
uniform norm. 



Suppose that u > 1/2. When d = 1, the latest integral in (C.8) can be computed 
easily to get 



/ 

J\v\>i 



>h,\*-v\>h 



-Z(\v\+\*-y\) dy = e -*M ( %-i + |x| - 1 



7 



if |x| > 1, and the value of the integral at x = is -e « . Thereby, when d = 1, 



^ (a* - <p(A;)) 2 = h * M*0 < 4 P 



V 



a 



7 



fe=i 



7 



e <* + /c — 1 



and (C.7), which involves the terms (C.4) (for v > 1/2) and (C.5), follows from the 
expansion 

oo / 1 \ 

y^(q + k)q k = - q + J for any a e R and \q\ < 1. 

When d > 2, the integral in (C.8) is more difficult to compute and we therefore 
establish an upper bound as follows. Since \\y\\ > (\yi\ + ■ ■ ■ + \yd\)/\fd, 

h*h(x)<4p 2 v 2 [ Y[e-^ {M+lxj - yjl) d Vj 

< Ap 2 u 2 d [ e -^3 {lyil+lxi - yil) d Vl n / e-^ ilyl+ ^- yl) dy. 



These integrals are computable: for any /3 > 0, 



-P(\y\+\xi-y\) 



\yi-xi\>h 



dy 



1 g-cosh(/3a;i) if |ari| < ^, 



e-^lf^ + l^l-i) if \ Xl \>l 



and 



/ 

Jr 



-P(\v\ + \*i-v\) 



dy = e 



-0\xj\ 
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Therefore, when d > 2, setting j3 = j/(ayd), 

5>*-v(*)) a 



< Ap 2 u 2 d 



P 



+ 2 £< 



-fSk 



k=l 



1-e-P , 1 



Y,z~ m ( \k\ + 



d-1 



and the bound (C.7), which involves the term (C.6), follows after a straightforward 
calculation. 



Suppose that v < 1/2. From (C.3) we deduce 

h*h(x)<^2 2 ~ 2 ^ [ lly/af-ill^-^/ar-le-^W+II^IDdy. 



v 



If Hxlloo > 1/2, then ||x|| > 1/2, and so \\x\\ v -V 2 < 2 1 / 2 ^ and 
h*h(x) < -^-,2 2 ' iu a 1 - 2u 7r I e -^CU-//ll-ll - -.vll . 



dy. 



v 



The latter integral may be bounded similarly as the one in (C.8), and thereby (C.7) 



which involves the term (C.4), follows. 



□ 



Note that the inequality (C.7) reduces to an equality in the particular case d = 1 



and v = 1/2. Finally, the plots in Figure [9] confirm that for reasonable values of p, 
v, and a satisfying (5.10), the error bound ( |C.7 ) is small. 



D Alternative approximation of the density 



Let S C M be a compact set. In this appendix, in addition to Assumption 5.2 



we assume the slightly stronger condition that the spectral density <p is strictly less 
than 1. This ensures that all eigenvalues are strictly less than 1 for all index k so 



that the density / in Theorem 4.1 is well defined. Recall that / is given in terms of 



C and D, cf. (4.3). Below we introduce computationally convenient approximations 



of C and D which can be used with (4.3) to obtain an approximation of /. 



D.l Convolution approximation of / 

We start by showing that C^pp^ given by 

oo 



(d.i; 



k=l 



is well-defined, where 



C* 1 ^) = C (u), C* k (u) = I C* (k ^\x)C (u -x)dx, u e R d , k = 2, 3, . . . . 

(D.2) 
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0.00 0.02 0.04 0.06 0.08 0.00 0.01 0.02 0.03 0.04 0.000 0.005 0.010 0.015 



p = 100, v = 0.1 



p = 100, v = 0.5 



p = 100, v = 3 




0.04 0.000 



p = 500, v = 0.1 



p = 500, v = 0.5 



p = 500, v = 3 



Figure 9: Error-bound (C.7) in terms of a for different values of p and z^, when d = 2. 



The dotted line represents the maximal possible value of a following from (5.10). 
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Since < ip < 1 and <p G L 1 ^), for all p G [l,oo], we have <p G L p (R d ). Define 
<p = (p/(l- <p). For any u G R d , (p(u) = lim^oo <p n (u), where <p n (u) = Y2=i ( p(u) k . 
We see that (p G L 1 (R d ) since 



/OO p oo « 

^( M )d M = ^ / ^dti^^iMi^- 1 / 

k=l k=l 



(p(u) du 



mh 



< oo 



using the monotone convergence theorem to swap summation and integration to 
obtain the second identity. Therefore J 7-1 ^ is well-defined. Using the dominated 
convergence theorem and similar arguments as above, we see that (J 7-1 ^) (u) is 



equal to the right hand side of (D.l). 



For x,y G S, we define C app (x,y) = C app fi(x — y) and use the approximation 
C(x,y) ~ C app (x,y). The expansion ( |D.l ) corresponds to (4.6) with Cg(x,y) sub- 
stituted by Co fc (x — y). 



Using the same substitution in (|4.5|) leads us to approximate D by 



app 



\S\J2c* k (0)/k. 



(D.3) 



k=i 



Since CQ fc (0) = J ip(u) k du, we obtain an alternative expression for D app by applying 
the monotone convergence theorem, 



D 



app 
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log(l — <p(u)) du = \S\ / log(l + (p{u)) du 



Then the convolution approximation of / is defined by 

/a PP ({zi, • • • , x n }) = expd^l - D app ) det[C a pp](a;i, ...,x r , 



(DA) 



As mentioned above, the approximations C app and D app involve approximating 
Cg(x,y) by C^iu), where u = x — y. In fact the approximations provide upper 
bounds, since Cg(x,y) < CQ k {u) for all x, y and k. Heuristically, when approxi- 
mating Cg(x,y) by C^iu), we expect that the relative error increases as k grows, 



since the approximation is applied iteratively, cf. (4.4) and (D.2). However, the final 
approximations C ~ C app and D ~ D app involve sums of Cg(x, y) and Cg fc (-u), and 



the terms with a large relative error may only have a small effect if Cq (u) tends 
to zero sufficiently fast for k — > oo. Since C^ k is a covariance function, we have 

CQ k {u) < Cg fc (0) for all k = 1,2, Consequently, we expect that the accuracy of 

approximating / by / app depends on how fast CQ fc (0) tends to zero. This is further 
discussed in the examples below. 



D.2 Examples 



To use the density approximation / app in practice we truncate the sums in (D.l) 
and (D.3), i.e. 



N 



N 



C appfi (u)*J2C* k (u) and D app ^\S\J2^ k (0)/k 



k=l 



k=l 
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where N is a positive integer. Furthermore, we need closed form expressions for 
Cg fc (n). For the normal variance mixture models presented in Section 5.2, we have 



CQ k {u) = (p / p max ) k ] h* k (u) , and so it suffices to find closed form expressions for h* k . 
For the Gaussian model, 

h* k (u) = (k7ra 2 )~ d/2 exp(-\\u/a\\ 2 /k), u e R d , 

while for the Whittle-Matern model, 

h*"(u)= l ) U J a t KAhM L , u £ M. d , 



2»'- 1 {^a) d T{v' + d/2)' 



where v' = k{y + d/2) — d/2. We have no closed form expression for the Cauchy 
model. 

For both the Gaussian and the Whittle-Matern covariance function, h* k (0) de- 
cays as k~ d l 2 when k — > <x>, and therefore the rate of convergence of C , apP! o(0) = 
X]fcli(p/Pmax) fc ^*' c (0) depends crucially on d and p. For d < 3, the series only 
converges if p < p max , and the series converges slowly when p is close to p max - 




Figure 10: Comparison of using the convolution and periodic density approximations 
to approximate the log-likelihood of the Gaussian model as a function of a based 
on a simulated dataset in the unit square with p = 200 and a = 0.02. 



Based on a simulated point pattern in the unit square, Figure 10 compares the 
approximations obtained using the convolution and periodic density approxima- 
tions to approximate the log-likelihood for the Gaussian model with p = 200 and 
a = 0.02. The simulated point pattern has p = 213 points. In the likelihood calcu- 
lations, p = p is fixed such that the only varying parameter is a 6 (0, a max ), where 
ctmax = = 0.39. For both approximations, the truncation iV was increased 

until almost no change appeared in the approximations. In this example, iV = 256 
for the convolution approximation and iV = 512 for the periodic approximation. As 
in the simulation study in Section 6.3, the periodic approximation is giving effec- 
tively unbiased estimates. However, similar simulation studies (not reported here) 
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using the convolution approximation yielded estimates of a which were positively 



biased, which is in agreement with Figure [10] In particular using the convolution 
approximation we get a large proportion of estimates with at = a max - For smaller 
values of a, a < a max /2 say, the two approximations are very similar, and in this 
case p/p m ax < 1/2, so the convolution approximation converges rapidly, and in this 
range of a-values, a truncation of iV = 10 is sufficient to obtain stable results. This is 
computationally much faster than using the periodic approximation with N = 512, 
and therefore the convolution approximation is appealing when p/ p max is small. 
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